A cleaner/more streamlined version of rotation_exploration.Rmd for assessing changes in clinical scores pre-post 7T glucest.

load packages

7T dates

Find the first 7T each participant is done to maximize length of potential follow-ups. Using glucest subject list extracted from .xls (list from David 1/13/22).

Note: doesn’t include scans past June 2019. Will need to update but probably won’t add many subj. with 1 yr post-7T clinical timepoints. Look at slack message from Arianna with subj she’s scanned and their latest clin assess to see if already included

## 'data.frame':    134 obs. of  3 variables:
##  $ BBLID      : int  12882 15305 15598 16422 16962 17491 17519 17523 17621 17622 ...
##  $ X7T_date   : chr  "" "" "" "" ...
##  $ ONM_7T_date: chr  "5/22/14" "2/20/14" "8/6/14" "11/25/14" ...
## 'data.frame':    134 obs. of  3 variables:
##  $ BBLID      : int  12882 15305 15598 16422 16962 17491 17519 17523 17621 17622 ...
##  $ X7T_date   : Date, format: NA NA ...
##  $ ONM_7T_date: Date, format: "2014-05-22" "2014-02-20" ...

Load All Data

Notes * HOW TO ID 22Q PPTS FROM PNC FOR REMOVAL?!?! * no date available for PNC dx, arbitrarily picking Jan 1 2000

## 
## FALSE 
##   269
## [1] 132

There are 132 (132) subjects who have 7T scans, excluding 22q pts ID’d from Kosha’s RedCap data

Diagnoses & Demographics

Won’t be the focus of our analyses but will be useful for grouping. Merging diagnoses from PNC and Kosha’s RedCap pull:

#first clean redcap dx source info
names(redcap)
##  [1] "BBLID"               "DODIAGNOSIS"         "HSTATUS"            
##  [4] "AXIS1_DX1"           "allaxis1desc"        "GASR_CURRENT"       
##  [7] "AGEONSET_AXIS1"      "AGEONSET_CLINICRISK" "dx_pscat"           
## [10] "timepoint"           "FROM_CAPA"           "FROM_DIGS"          
## [13] "FROM_SCID"           "FROM_FIGS"           "FROM_RECORDS"       
## [16] "FROM_MINI"           "FROM_SIPS"
sources <- c("FROM_CAPA", "FROM_DIGS", "FROM_SCID", "FROM_FIGS", "FROM_RECORDS", "FROM_MINI", "FROM_SIPS")
redcap[,sources] <- redcap[,sources] %>% 
  transmute_all(funs(ifelse(. == 1, deparse(substitute(.)), NA))) #replace coding with col name
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
redcap <- redcap %>% unite(dx_source, c(sources), na.rm = TRUE, remove=FALSE) #merge across - not pretty but gets the point across for now
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(sources)` instead of `sources` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
redcap_dx <- redcap %>%
  rename(bblid=BBLID, dx_date=DODIAGNOSIS, dx=dx_pscat) %>%
  mutate(dx_date=as.Date(dx_date)) %>%
  select (bblid, dx, dx_source, dx_date)
  

all_dx <- rbind(redcap_dx,goassess)%>% 
  subset(!(bblid %in% q22$BBLID)) #drop 22q sub
id_date <- all_dx %>% select(bblid, dx_date)
table(duplicated(id_date))#no duplicate rows, all ID and dx_date combos unique!
## 
## FALSE 
##  9781

Now combine dx with scan dates:

dx_scan_date <- right_join(all_dx, first7t, by = c("bblid" = "BBLID")) %>%
  mutate(bblid = as.factor(bblid))

length(unique(dx_scan_date$bblid))
## [1] 132
sum(is.na(dx_scan_date$dx)) #some subjects with scans have no diagnosis from Kosha's redcap or goassess
## [1] 24
dx_missing <- dx_scan_date %>% subset(is.na(dx), select = c(bblid, first_7t_date))
# write.csv(dx_missing, "missing_dx.csv", row.names=FALSE)

There are 24 (24) subj who have 7T scans but are missing diagnoses from redcap/PNC!

Seem to have some data for these ppts in spreadsheets David added/demographics from 7T list - reviewed with David

Proceeding for now with dx available, recoding factors for consistency between PNC and RedCap when possible:

table(dx_scan_date$dx)
## 
## noDSMdx      OP   other     pro      PS     psy      TD 
##     105      22      46      98      38      34      15
dx_scan_date <- dx_scan_date %>% mutate(dx = as.factor(dx))
dx_scan_date$dx <- fct_collapse(dx_scan_date$dx, "healthy" = c("noDSMdx", "TD"),
                                "other_dx" = c("OP", "other"),
                                "psych_spect" = c("psy", "PS", "pro")) #recode factors

#make new df with only baseline dx (closest to 7T)
#calc diff (negative values -> dx before 7t)
dx_scan_date$dx_diff = as.numeric(difftime(dx_scan_date$dx_date, dx_scan_date$first_7t_date, units = "days"))
#select one dx per subject with date closest to &t
dx_baseline <- dx_scan_date %>%
  mutate(abs_diff_in_days=abs(dx_diff)) %>%
  group_by(bblid) %>%
  arrange(abs_diff_in_days) %>%
  slice(c(1)) %>%
  ungroup() %>%
  select(!c(abs_diff_in_days))

summary(dx_baseline$dx_diff) #not great alignment between date of dx and date of scan
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -6957.0  -232.8   -99.5  -240.9   -21.0   839.0      24
ggplot(dx_baseline, aes(x=dx_diff, y=bblid, color=dx)) + 
  geom_point() +
  labs(title="Dx Relative to First 7T Scan", x="Days from 7T Scan") +
  theme(axis.text.y=element_blank(),
        axis.ticks.y=element_blank())
## Warning: Removed 24 rows containing missing values (geom_point).

Adding in demographics - will have dx from 7T list in separate column for now, since I’m not sure when/where they came from

#recode & rename variables for clarity
demo <- demo %>% select(-Sex_1M, -Ethnicity, -Age)
demo$sex <- recode(demo$sex, "1" = "M", "2" = "F")

#merge with first 7t list
dx_baseline <- left_join(dx_baseline, demo, by = c("bblid"="BBLID"))

#calculate age at first 7t
dx_baseline$age_scan = as.period(interval(start = dx_baseline$DOB, end = dx_baseline$first_7t_date))$year
dx_baseline <- dx_baseline %>% select(-DOB) #drop DOB since we don't need that PHI anymore

SIPS & SOPS

Some subjects have multiple assessments on one day (one from proband one from family, etc); need to be sure to pick only TYPE == "COMBINED" per discussion with David. If no combined, pick proband.

ADD SIPS DATA FROM GOASSESS!

#SIPS + first 7t and dx
sips_date <- inner_join(sips, dx_baseline, by = "bblid")
str(sips_date$bblid)
##  Factor w/ 134 levels "14528","15305",..: 2 3 3 4 5 5 5 6 7 7 ...
#calc diff (negative values -> dx before 7t)
sips_date$diff_in_days = as.numeric(difftime(sips_date$DOSIPS, sips_date$first_7t_date, units = "days"))
# drop data more than 1 yr prior to scan
sips_date <- subset(sips_date, diff_in_days >= -365)
str(sips_date$bblid) #no subj lost
##  Factor w/ 134 levels "14528","15305",..: 2 3 5 7 7 10 10 11 14 15 ...
min(sips_date$diff_in_days) #confirm
## [1] -354
#find subjects that have a baseline sips and sops up to 364 days post-7T 
clin364_sip <-  sips_date  %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
#find subjects that have a followup sips sops >365 days post-7T 
clin365_sip <- sips_date %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
#filter df on date
sips.sops_scan <- subset(sips_date, (bblid %in% clin365_sip$bblid & bblid %in% clin364_sip$bblid))

length(unique(sips.sops_scan$bblid))
## [1] 27
#if subj has more than one assessment for same day, select Combined 
sips.sops_scan <- sips.sops_scan %>% mutate(TYPE = as.factor(TYPE))
levels(sips.sops_scan$TYPE) 
## [1] "Collateral" "Combined"   "FP"         "IP"         "P"         
## [6] "proband"    "Proband"
sips.sops_scan$TYPE <- fct_collapse(sips.sops_scan$TYPE, "Proband" = c("Proband", "proband", "P", "FP", "IP")) #combine all proband
sips.sops_scan$TYPE <- factor(sips.sops_scan$TYPE, levels = c("Combined", "Proband", "Collateral"))

#group data to select correct type (defined as lowest ordered level)
sips.sops_scan.filt <- sips.sops_scan %>%
  group_by(bblid, DOSIPS) %>%
  arrange(TYPE) %>%
  slice(c(1)) %>%
  ungroup()

length(unique(sips.sops_scan.filt$bblid)) #confirm no subjects deleted
## [1] 27
skimr::skim(sips.sops_scan.filt)
Data summary
Name sips.sops_scan.filt
Number of rows 76
Number of columns 57
_______________________
Column type frequency:
character 13
Date 3
factor 7
numeric 34
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
date_7T 0 1.00 10 10 0 27 0
PROTOCOL 0 1.00 12 23 0 10 0
RATER 0 1.00 4 8 0 20 0
SPD 13 0.83 1 1 0 1 0
FHPI 11 0.86 1 1 0 2 0
PRODROMAL 23 0.70 1 1 0 2 0
CONSIDER 23 0.70 1 1 0 2 0
REVIEW 23 0.70 1 1 0 1 0
FIRST_DEG_SCZ 17 0.78 1 1 0 2 0
SIPS_SOURCEID 72 0.05 7 7 0 4 0
SOURCE_PROJECT 72 0.05 3 3 0 1 0
type2 0 1.00 7 10 0 4 0
dx_source 0 1.00 9 39 0 5 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
DOSIPS 0 1 2013-01-10 2021-10-06 2016-04-20 75
dx_date 0 1 2013-11-14 2018-09-21 2015-01-30 27
first_7t_date 0 1 2013-10-11 2018-11-30 2014-10-11 25

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
bblid 0 1 FALSE 27 109: 7, 999: 5, 105: 5, 820: 4
TYPE 0 1 FALSE 3 Pro: 63, Com: 12, Col: 1
dx 0 1 FALSE 3 psy: 51, hea: 16, oth: 9
Diagnostic_Group 0 1 FALSE 4 PRO: 49, TD/: 16, Oth: 6, PSY: 5
Race 0 1 FALSE 4 Bla: 59, Whi: 8, Mix: 5, Asi: 4
sex 0 1 FALSE 2 F: 44, M: 32
ethnicity 0 1 FALSE 2 2: 73, 1: 3

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
VISITNUM 4 0.95 1.07 0.31 1.0 1.00 1.0 1.00 3.0 ▇▁▁▁▁
P1 1 0.99 2.00 1.93 0.0 0.00 2.0 3.00 6.0 ▇▅▂▂▃
P2 1 0.99 1.44 1.60 0.0 0.00 1.0 3.00 6.0 ▇▂▂▁▁
P3 1 0.99 1.00 1.57 0.0 0.00 0.0 2.00 6.0 ▇▁▂▁▁
P4 1 0.99 1.52 1.93 0.0 0.00 0.0 3.00 6.0 ▇▁▂▁▂
P5 1 0.99 1.64 1.63 0.0 0.00 2.0 3.00 6.0 ▇▃▂▂▁
N1 1 0.99 1.13 1.41 0.0 0.00 1.0 2.00 5.0 ▇▃▁▁▁
N2 1 0.99 0.64 1.06 0.0 0.00 0.0 1.00 4.0 ▇▂▁▁▁
N3 1 0.99 0.81 1.14 0.0 0.00 0.0 1.00 4.0 ▇▂▂▁▁
N4 1 0.99 0.49 1.14 0.0 0.00 0.0 0.00 5.0 ▇▁▁▁▁
N5 1 0.99 0.79 1.13 0.0 0.00 0.0 1.00 4.0 ▇▂▂▁▁
N6 1 0.99 1.44 2.21 0.0 0.00 0.0 3.00 6.0 ▇▁▁▁▂
D1 1 0.99 0.60 1.10 0.0 0.00 0.0 1.00 5.0 ▇▁▁▁▁
D2 1 0.99 0.87 1.52 0.0 0.00 0.0 1.00 6.0 ▇▁▁▁▁
D3 2 0.97 0.91 1.25 0.0 0.00 0.0 2.00 4.0 ▇▂▂▂▁
D4 1 0.99 0.37 0.82 0.0 0.00 0.0 0.00 4.0 ▇▁▁▁▁
G1 1 0.99 1.08 1.62 0.0 0.00 0.0 2.00 6.0 ▇▁▂▁▁
G2 1 0.99 1.05 1.55 0.0 0.00 0.0 2.00 6.0 ▇▂▁▁▁
G3 1 0.99 0.00 0.00 0.0 0.00 0.0 0.00 0.0 ▁▁▇▁▁
G4 1 0.99 0.79 1.27 0.0 0.00 0.0 1.00 6.0 ▇▁▁▁▁
GAF_C 10 0.87 64.76 18.01 21.0 51.00 65.0 80.75 95.0 ▁▆▇▇▇
GAF_H 15 0.80 67.59 16.64 21.0 53.00 70.0 81.00 95.0 ▁▅▆▇▇
POS_345 23 0.70 1.32 1.49 0.0 0.00 1.0 2.00 5.0 ▇▃▁▁▁
POS_6 23 0.70 0.19 0.74 0.0 0.00 0.0 0.00 4.0 ▇▁▁▁▁
NEG_3456 23 0.70 0.77 0.97 0.0 0.00 1.0 1.00 4.0 ▇▆▂▁▁
DIS_3456 23 0.70 0.53 0.80 0.0 0.00 0.0 1.00 3.0 ▇▃▁▁▁
GEN_3456 23 0.70 0.47 0.80 0.0 0.00 0.0 1.00 3.0 ▇▂▁▁▁
GAF_PCTCHG 14 0.82 -0.60 3.50 -25.0 0.00 0.0 0.00 0.0 ▁▁▁▁▇
POSS_PRO_DX 48 0.37 348.34 0.12 348.1 348.30 348.4 348.40 348.4 ▂▁▁▁▇
timepoint 0 1.00 5.36 3.77 1.0 3.00 4.5 6.25 19.0 ▇▆▁▁▁
ntimepoints 0 1.00 7.03 4.60 2.0 4.00 6.0 7.25 19.0 ▇▅▁▁▂
dx_diff 0 1.00 14.24 146.00 -259.0 -71.00 -39.5 192.00 269.0 ▁▇▃▁▆
age_scan 0 1.00 19.50 3.17 15.0 16.00 19.0 22.00 26.0 ▇▃▅▃▂
diff_in_days 0 1.00 574.83 777.89 -304.0 -55.25 392.5 958.75 2677.0 ▇▃▃▁▁

SOPS

SOPS seems to be the 12 item components of SIPS (which are overall ratings of severity). Focusing on SOPS (summing within and subscales Pos, Neg, Gen, Disorg) - seem to be valid scoring based on ct.gov.

#filter for SOPS completion
sops_scan <- sips.sops_scan.filt %>% 
  drop_na(P1) %>% 
  select(c(1, 7:26, 47:57)) #subset columns

#since dropping NA we need to refilter pre-post subj
##find subjects that have a baseline sops up to 364 days post-7T 
clin364_sop <-  sops_scan  %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
##find subjects that have a followup sips sops >365 days post-7T 
clin365_sop <- sops_scan %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
##filter df
sops_scan <- subset(sops_scan, (bblid %in% clin365_sop$bblid & bblid %in% clin364_sop$bblid))

sum(is.na(sops_scan[3:21]))#see missing sops scores
## [1] 1
sops_scan <- sops_scan %>% mutate(sops_p = rowSums(across(3:7), na.rm=F),
                                  sops_n = rowSums(across(8:13), na.rm=F),
                                  sops_d = rowSums(across(14:17), na.rm=F),
                                  sops_g = rowSums(across(c(G1, G2, G3, G4)), na.rm=F), #index not working for some unknown and horribly frustrating reason
                                  sops_tot = rowSums(across(c(sops_p, sops_n, sops_d, sops_g), na.rm=F)))

sum(is.na(sops_scan$sops_tot))
## [1] 1
#plot for fun
ggplot(sops_scan, aes(x=diff_in_days, y=bblid, color=sops_tot)) + 
  geom_point() +
  geom_vline(xintercept=365) +
  labs(title="SOPS Relative to First 7T Scan", x="Days from 7T Scan") +
  theme(legend.position = "none")

length(unique(sops_scan$bblid))
## [1] 27
sops_tot <- sops_scan %>% drop_na(sops_tot)
length(unique(sops_tot$bblid))
## [1] 27

Dropping subjects with missing items for now, could be used for subscale scores

Looking just at change immediately pre-post 7T:

There are 27 (27) subjects with SOPS at baseline and at least 1 post-1 yr followup, though total pre-post scores are available only for 26 subjects. Negative change -> ppt’s score is getting lower (i.e. symptoms improving)

##      bblid              dx      sops_tot_pre   sops_tot_post  
##  17622  : 1   healthy    : 7   Min.   : 0.00   Min.   : 0.00  
##  18093  : 1   other_dx   : 2   1st Qu.: 4.25   1st Qu.: 7.00  
##  82051  : 1   psych_spect:17   Median :22.50   Median :16.00  
##  83835  : 1                    Mean   :21.65   Mean   :18.92  
##  85369  : 1                    3rd Qu.:31.75   3rd Qu.:31.25  
##  87225  : 1                    Max.   :74.00   Max.   :57.00  
##  (Other):20                                                   
##    change_pos         change_neg        change_dis         change_gen     
##  Min.   :-13.0000   Min.   :-17.000   Min.   :-12.0000   Min.   :-7.0000  
##  1st Qu.: -4.5000   1st Qu.: -2.750   1st Qu.: -3.0000   1st Qu.:-1.0000  
##  Median :  0.0000   Median : -1.000   Median : -0.5000   Median : 0.0000  
##  Mean   : -0.2692   Mean   : -1.615   Mean   : -0.9615   Mean   : 0.1154  
##  3rd Qu.:  3.2500   3rd Qu.:  1.000   3rd Qu.:  0.0000   3rd Qu.: 0.7500  
##  Max.   : 15.0000   Max.   :  7.000   Max.   :  7.0000   Max.   : 8.0000  
##                                                                           
##    change_tot     
##  Min.   :-45.000  
##  1st Qu.: -8.000  
##  Median : -1.500  
##  Mean   : -2.731  
##  3rd Qu.:  7.000  
##  Max.   : 24.000  
## 

Line plot of all SOPS Total for each subj:

## Warning: Removed 1 row(s) containing missing values (geom_path).

SIPS

Pulling “SIPS” ratings number of Pos, Neg, Gen, etc items on SOPS coded as 3,4,5, or 6 depending on suffix - calculating for subj that aren’t already input. Not planning to use these in final analyses, but already wrote the code.

# #filter df for SOPS item completion
# sips_scan <- dx_sips.sops_scan %>% 
#   drop_na(POS_345, POS_6, NEG_3456, GEN_3456) %>% 
#   mutate(POS_3456= rowSums(across(c(POS_345, POS_6)))) #concatenating across frequency of p sxs for consistency across subscales

sips_calc <- sips.sops_scan.filt %>%
  drop_na(8:26)
#calc sips
sips_calc$sips_pos <- apply(sips_calc[8:12], 1, function(x) length(which(x>2)))
sips_calc$sips_neg <- apply(sips_calc[13:18], 1, function(x) length(which(x>2)))
sips_calc$sips_dis <- apply(sips_calc[19:22], 1, function(x) length(which(x>2)))
sips_calc$sips_gen <- apply(sips_calc[23:26], 1, function(x) length(which(x>2)))
sips_scan <- sips_calc %>% select(1, 7, 47:61)

#since dropping NA we need to refilter pre-post subj
##find subjects that have a baseline sops up to 364 days post-7T 
clin364_sip2 <-  sips_scan  %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
##find subjects that have a followup sips sops >365 days post-7T 
clin365_sip2 <- sips_scan %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
##filter df
sips_scan <- subset(sips_scan, (bblid %in% clin364_sip2$bblid & bblid %in% clin365_sip2$bblid))

length(unique(sips_scan$bblid))
## [1] 26

There are 26 (26) subjects with SIPS at baseline and at least 1 post-1 yr followup

Calculate change across time (i.e. change from one timepoint to the next):

#calculate change in SIPS scores, save and plot
sips_change <- sips_scan %>% 
  group_by(bblid) %>% 
  arrange(diff_in_days) %>%
  mutate("pos_diff"= sips_pos - lag(sips_pos),
         "neg_diff"= sips_neg - lag(sips_neg),
         "gen_diff"= sips_gen -lag(sips_gen),
         "dis_diff"= sips_dis - lag(sips_dis)) %>%
  ungroup()

summary(sips_change)
##      bblid        DOSIPS                     dx      dx_source        
##  109577 : 6   Min.   :2013-01-10   healthy    :16   Length:73         
##  99949  : 5   1st Qu.:2014-12-20   other_dx   : 7   Class :character  
##  105176 : 5   Median :2016-03-19   psych_spect:50   Mode  :character  
##  82051  : 4   Mean   :2016-12-08                                      
##  139272 : 4   3rd Qu.:2018-06-29                                      
##  83835  : 3   Max.   :2021-10-06                                      
##  (Other):46                                                           
##     dx_date           first_7t_date           dx_diff        Diagnostic_Group
##  Min.   :2013-11-14   Min.   :2013-10-11   Min.   :-259.00   Other  : 6      
##  1st Qu.:2014-08-27   1st Qu.:2014-07-16   1st Qu.: -71.00   PRO/CHR:46      
##  Median :2015-01-19   Median :2014-09-25   Median : -43.00   PSY    : 5      
##  Mean   :2015-05-22   Mean   :2015-05-11   Mean   :  11.53   TD/NC  :16      
##  3rd Qu.:2015-04-17   3rd Qu.:2015-08-06   3rd Qu.: 192.00                   
##  Max.   :2018-09-21   Max.   :2018-11-30   Max.   : 269.00                   
##                                                                              
##                      Race    sex    ethnicity    age_scan      diff_in_days   
##  Asian                 : 4   M:30   1: 3      Min.   :15.00   Min.   :-304.0  
##  Black/African American:56   F:43   2:70      1st Qu.:16.00   1st Qu.: -56.0  
##  Mixed                 : 5                    Median :19.00   Median : 387.0  
##  White                 : 8                    Mean   :19.55   Mean   : 577.7  
##                                               3rd Qu.:22.00   3rd Qu.:1000.0  
##                                               Max.   :26.00   Max.   :2677.0  
##                                                                               
##     sips_pos        sips_neg         sips_dis         sips_gen     
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.000   Median :1.0000   Median :0.0000   Median :0.0000  
##  Mean   :1.479   Mean   :0.7808   Mean   :0.4795   Mean   :0.5068  
##  3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :5.000   Max.   :4.0000   Max.   :3.0000   Max.   :3.0000  
##                                                                    
##     pos_diff          neg_diff          gen_diff           dis_diff      
##  Min.   :-3.0000   Min.   :-4.0000   Min.   :-2.00000   Min.   :-2.0000  
##  1st Qu.:-1.0000   1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.:-1.0000  
##  Median : 0.0000   Median : 0.0000   Median : 0.00000   Median : 0.0000  
##  Mean   :-0.1702   Mean   :-0.1915   Mean   :-0.06383   Mean   :-0.1702  
##  3rd Qu.: 1.0000   3rd Qu.: 0.0000   3rd Qu.: 0.00000   3rd Qu.: 0.0000  
##  Max.   : 4.0000   Max.   : 1.0000   Max.   : 1.00000   Max.   : 2.0000  
##  NA's   :26        NA's   :26        NA's   :26         NA's   :26
#plot longitudinal scores
ggplot(sips_change, aes(x=diff_in_days, y=sips_pos, color=bblid)) +
  geom_line() +
  labs(title="SIPS Pos")

ggplot(sips_change, aes(x=diff_in_days, y=sips_neg, color=bblid)) +
  geom_line() +
  labs(title="SIPS Neg")

ggplot(sips_change, aes(x=diff_in_days, y=sips_gen, color=bblid)) +
  geom_line() +
  labs(title="SIPS Gen")

ggplot(sips_change, aes(x=diff_in_days, y=sips_dis, color=bblid)) +
  geom_line() +
  labs(title="SIPS Dis")

#plot change
ggplot(sips_change, aes(x=pos_diff)) + 
  geom_histogram(color="white", fill= "darkgreen") + 
  labs(title="Change in SIPS Pos")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).

ggplot(sips_change, aes(x=neg_diff)) + 
  geom_histogram(color="white", fill= "darkgreen") + 
  labs(title="Change in SIPS Neg")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).

ggplot(sips_change, aes(x=gen_diff)) + 
  geom_histogram(color="white", fill= "darkgreen") + 
  labs(title="Change in SIPS Gen")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).

Looking at just change immediately pre-post 7T

#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post <- sips_scan %>%
  select(1, 2, 13:17) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
  group_by(bblid) %>%
  filter(diff_in_days >= 365) %>%
  arrange(bblid, diff_in_days) %>%
  slice(c(1))%>% 
  rename_with( ~ paste0(.x, "_post"))

#find datediff that's <365 and closest to 0
pre <- sips_scan %>%
  group_by(bblid) %>%
  filter(diff_in_days < 365) %>%
  mutate(abs_diff_in_days=abs(diff_in_days)) %>%
  arrange(bblid, abs_diff_in_days) %>%
  slice(c(1)) %>% 
  select(!c(abs_diff_in_days)) %>% 
  rename_at(c(2, 13:17), ~ paste0(.x, "_pre"))

#new dataframe
proximal_sips <- inner_join(pre, post, by = c("bblid" = "bblid_post")) %>%
  mutate(change_pos = (sips_pos_post - sips_pos_pre),
         change_neg = (sips_neg_post - sips_neg_pre),
         change_gen = (sips_gen_post - sips_gen_pre))

#plot change
ggplot(proximal_sips, aes(x=change_pos)) + 
  geom_bar(color="white", fill= "darkgreen") + 
  labs(title="Change in SIPS Pos")

ggplot(proximal_sips, aes(x=change_neg)) + 
  geom_bar(color="white", fill= "darkgreen") + 
  labs(title="Change in SIPS Neg")

ggplot(proximal_sips, aes(x=change_gen)) + 
  geom_bar(color="white", fill= "darkgreen") + 
  labs(title="Change in SIPS Gen")

PRIME

Prime Screen scores were computed by totaling the number of items endorsed at the level of 5 or 6 and via raw sum of scores(SIP003 to SIP014) (per Zak’s paper and scale authors). Combining data from Kosha and David, keeping proband over collateral reports when one subject has both on the same day.

#add prime scores from david's spreadsheets
syrp_prime <- syrp %>%
  select(5, 7, 10, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35) %>%
  rename(sip003=prime_1, sip004=prime_2, sip005=prime_3, sip006=prime_4, sip007=prime_5, sip008=prime_6, sip009=prime_7, sip010=prime_8, sip011=prime_9, sip012=prime_10, sip013=prime_11, sip014=prime_12, source=admin_proband) %>% #rename to match redcap prime items
  mutate(bblid=as.character(bblid)) #setting back to character for now for rbind
prime_s <- prime %>% 
  mutate(admin_proband = str_detect(redcapid, "P"), 
         admin_collat = str_detect(redcapid, "C"),
         source = case_when(admin_proband == TRUE ~ "p",
                           admin_collat == TRUE ~ "c")) %>%
    select(1, 5, 75, 8:19) %>% #subset columns
    mutate(bblid=as.character(bblid)) #setting back to character for now for rbind

#combine both prime data sources
all_prime <- rbind(syrp_prime,prime_s) %>%
   drop_na(4:15) %>% #drop rows with missing values
   mutate (bblid=as.factor(bblid)) %>% #set bblid back to factor
   distinct() #keep only unique rows

#check for multiple assessments per subj per day
id_date <- all_prime %>% select(bblid, date)
table(duplicated(id_date))
## 
## FALSE  TRUE 
##   231    19
#select proband assessment if there are two proband & collateral assesments on one day
all_prime$source <- factor(all_prime$source, ordered = TRUE, levels=c("p", "c"))
prime_p <- all_prime %>%
  group_by(bblid, date) %>% #select only the first instance of a specific bblid on a specific day ~ proband
  arrange(source) %>%
  slice(c(1)) %>%
  ungroup()

#join with 7T dates & baseline dx
prime_date <- left_join(prime_p, dx_baseline, by = c("bblid" = "bblid")) 

#calc diff (negative values -> prime before 7t)
prime_date$diff_in_days = as.numeric(difftime(prime_date$date, prime_date$first_7t_date, units = "days"))

# drop data more than 1 yr prior to scan
prime_dx <- subset(prime_date, diff_in_days >= -365)
str(prime_date$bblid) #no subj lost
##  Factor w/ 140 levels "102041","105176",..: 1 1 1 2 2 2 3 3 3 4 ...
min(prime_date$diff_in_days) #confirm datediff filtered
## [1] NA
#find subjects that have a baseline sips and sops up to 364 days post-7T 
clin364_pr <-  prime_dx %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
#find subjects that have a followup sips sops >365 days post-7T 
clin365_pr <- prime_dx %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
#filter df for appropriate timepoints
prime_scan <- subset(prime_dx, (bblid %in% clin365_pr$bblid & bblid %in% clin364_pr$bblid)) %>% 
  mutate(prime_sum = rowSums(across(4:15))) #calculate raw sum across items
  
#calculate prime_scores
prime_scan$count.5 <- apply(prime_scan[4:15], 1, function(x) length(which(x==5)))
prime_scan$count.6 <- apply(prime_scan[4:15], 1, function(x) length(which(x==6)))
prime_scan$prime_score <- rowSums(cbind(prime_scan$count.5, prime_scan$count.6))

#plot for fun
ggplot(prime_scan, aes(x=diff_in_days, y=bblid, color=prime_score)) +
  geom_point() +
  geom_vline(xintercept=365) +
  labs(title="PRIME Score Relative to First 7T Scan", x="Days from 7T Scan")

length(unique(prime_scan$bblid))
## [1] 26
summary(prime_scan)
##      bblid         date             source       sip003        sip004      
##  83835  : 5   Min.   :2014-10-30   p   :54   Min.   :0.0   Min.   :0.0000  
##  117397 : 4   1st Qu.:2016-12-17   c   : 0   1st Qu.:0.0   1st Qu.:0.0000  
##  121407 : 4   Median :2018-04-02   NA's:16   Median :0.0   Median :0.0000  
##  89279  : 4   Mean   :2018-03-24             Mean   :0.9   Mean   :0.5429  
##  109577 : 3   3rd Qu.:2019-03-15             3rd Qu.:1.0   3rd Qu.:0.0000  
##  112126 : 3   Max.   :2022-01-20             Max.   :6.0   Max.   :6.0000  
##  (Other):47                                                                
##      sip005        sip006           sip007        sip008           sip009      
##  Min.   :0.0   Min.   :0.0000   Min.   :0.0   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0   Median :0.0000   Median :0.0   Median :0.0000   Median :0.0000  
##  Mean   :0.3   Mean   :0.8571   Mean   :0.9   Mean   :0.4286   Mean   :0.2714  
##  3rd Qu.:0.0   3rd Qu.:1.0000   3rd Qu.:1.0   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :5.0   Max.   :6.0000   Max.   :6.0   Max.   :6.0000   Max.   :6.0000  
##                                                                                
##      sip010           sip011           sip012           sip013      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.5143   Mean   :0.6571   Mean   :0.6143   Mean   :0.5571  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :6.0000   Max.   :5.0000   Max.   :6.0000   Max.   :6.0000  
##                                                                     
##      sip014                 dx      dx_source            dx_date          
##  Min.   :0.0000   healthy    :26   Length:70          Min.   :2014-06-30  
##  1st Qu.:0.0000   other_dx   : 6   Class :character   1st Qu.:2015-01-19  
##  Median :0.0000   psych_spect:36   Mode  :character   Median :2015-04-29  
##  Mean   :0.4286   NA's       : 2                      Mean   :2016-04-11  
##  3rd Qu.:0.0000                                       3rd Qu.:2018-02-16  
##  Max.   :6.0000                                       Max.   :2020-10-02  
##                                                       NA's   :2           
##  first_7t_date           dx_diff         Diagnostic_Group
##  Min.   :2014-03-07   Min.   :-424.000   Other  : 7      
##  1st Qu.:2014-08-19   1st Qu.:-131.000   PRO/CHR:29      
##  Median :2015-08-11   Median : -50.500   PSY    : 6      
##  Mean   :2016-05-05   Mean   :   8.765   TD/NC  :28      
##  3rd Qu.:2018-03-30   3rd Qu.: 193.500                   
##  Max.   :2019-05-10   Max.   : 616.000                   
##                       NA's   :2                          
##                      Race    sex    ethnicity    age_scan      diff_in_days   
##  Asian                 : 0   M:27   1: 2      Min.   :15.00   Min.   :-294.0  
##  Black/African American:52   F:43   2:68      1st Qu.:19.00   1st Qu.:   0.0  
##  Mixed                 : 2                    Median :20.00   Median : 546.0  
##  White                 :16                    Mean   :20.24   Mean   : 688.9  
##                                               3rd Qu.:22.00   3rd Qu.:1265.2  
##                                               Max.   :26.00   Max.   :2746.0  
##                                                                               
##    prime_sum         count.5          count.6        prime_score    
##  Min.   : 0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 1.000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   : 6.971   Mean   :0.2286   Mean   :0.2857   Mean   :0.5143  
##  3rd Qu.: 9.750   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :47.000   Max.   :3.0000   Max.   :4.0000   Max.   :4.0000  
## 

There are 26 (26) subjects with PRIME scores at baseline and at least 1 post-1 yr followup.

Look at just change immediately pre-post 7T. Check calculations for percent change make sense!

#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post_prime <- prime_scan %>%
  select(1:15, 26:30) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
  group_by(bblid) %>%
  filter(diff_in_days >= 365) %>%
  arrange(bblid, diff_in_days) %>%
  slice(c(1))%>% 
  rename_with( ~ paste0(.x, "_post"))

#find datediff that's <365 and closest to 0
pre_prime <- prime_scan %>%
  group_by(bblid) %>%
  filter(diff_in_days < 365) %>%
  mutate(abs_diff_in_days=abs(diff_in_days)) %>%
  arrange(bblid, abs_diff_in_days) %>%
  slice(c(1)) %>% 
  select(!c(abs_diff_in_days)) %>% 
  rename_at(c(2:15, 26:30), ~ paste0(.x, "_pre"))


#new dataframe
proximal_prime <- inner_join(pre_prime, post_prime, by = c("bblid" = "bblid_post")) %>%
  mutate(change_prime_sum = (prime_sum_post - prime_sum_pre),#calculate change
         change_prime = (prime_score_post - prime_score_pre),
         p.change_prime_sum = ((change_prime_sum+1e-10)/(prime_sum_pre+1e-10))*100, #calculating percent change, adding .00000000001 to each to keep from dividing by zero
         p.change_prime = ((change_prime+1e-10)/(prime_score_pre+1e-10))*100,
         p.change_prime_sum = round(p.change_prime_sum, digits=1), #rounding to undo small addition
         p.change_prime = round(p.change_prime, digits=1)) 
#calculating days between prime timepoints
proximal_prime$prime_date_diff = as.numeric(difftime(proximal_prime$date_post, proximal_prime$date_pre, units = "days"))

proximal_prime %>% select(dx, prime_sum_pre, prime_score_pre, prime_sum_post, prime_score_post, change_prime_sum, change_prime, p.change_prime_sum, p.change_prime, prime_date_diff) %>% summary()
## Adding missing grouping variables: `bblid`
##      bblid              dx     prime_sum_pre    prime_score_pre 
##  109577 : 1   healthy    : 9   Min.   : 0.000   Min.   :0.0000  
##  112126 : 1   other_dx   : 2   1st Qu.: 0.000   1st Qu.:0.0000  
##  116019 : 1   psych_spect:14   Median : 3.500   Median :0.0000  
##  116354 : 1   NA's       : 1   Mean   : 9.346   Mean   :0.7308  
##  117397 : 1                    3rd Qu.:12.250   3rd Qu.:0.0000  
##  120217 : 1                    Max.   :47.000   Max.   :4.0000  
##  (Other):20                                                     
##  prime_sum_post   prime_score_post change_prime_sum   change_prime    
##  Min.   : 0.000   Min.   :0.0000   Min.   :-35.000   Min.   :-4.0000  
##  1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.: -4.750   1st Qu.: 0.0000  
##  Median : 0.000   Median :0.0000   Median : -0.500   Median : 0.0000  
##  Mean   : 5.962   Mean   :0.4231   Mean   : -3.385   Mean   :-0.3077  
##  3rd Qu.: 9.000   3rd Qu.:0.0000   3rd Qu.:  0.000   3rd Qu.: 0.0000  
##  Max.   :32.000   Max.   :4.0000   Max.   : 12.000   Max.   : 1.0000  
##                                                                       
##  p.change_prime_sum   p.change_prime   prime_date_diff
##  Min.   :-1.000e+02   Min.   :-100.0   Min.   : 385   
##  1st Qu.:-1.000e+02   1st Qu.: 100.0   1st Qu.: 729   
##  Median :-1.200e+01   Median : 100.0   Median :1087   
##  Mean   : 6.538e+11   Mean   :  65.7   Mean   :1068   
##  3rd Qu.: 1.000e+02   3rd Qu.: 100.0   3rd Qu.:1179   
##  Max.   : 1.200e+13   Max.   : 100.0   Max.   :2458   
## 
#plots
ggplot(proximal_prime, aes(x=change_prime)) + 
  geom_histogram(color="white", fill= "darkgreen") + 
  labs(title="Change in PRIME Score")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(proximal_prime, aes(x=change_prime_sum)) + 
  geom_histogram(color="white", fill= "darkgreen") + 
  labs(title="Change in PRIME Sums")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Very little change in PRIME score (counts of 5 and 6 rated sxs), more variability across score sums. Negative change -> ppt’s score is getting lower (i.e. symptoms improving).

Plotting PRIME with dx ~ not all points visible due to overlap

#plot absolute change between proximal PRIME scores
ggplot(proximal_prime, aes(x=change_prime_sum, y=prime_sum_pre, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = bblid), max.overlaps = 30) +
  labs(title="Change in PRIME Sums Over Time", x="Change (Post - Pre)", y="Baseline PRIME Total")

#plot absolute change vs time between proximal GAF Cs
ggplot(proximal_prime, aes(x=prime_date_diff, y=change_prime_sum, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = bblid), max.overlaps = 30) +
  labs(title="Change in PRIME By Delta Time", x="Time Between PRIME Assessments (Days)", y="Change PRIME (Post - Pre)")

GAF

GAFC

Looking through SIPS GAF current

## [1] 22
## [1] 0

There are 27 (22) subjects with GAFC at baseline and at least 1 post-1 yr followup. Looking at scores across time:

#plot longitudinal scores
ggplot(gafc_scan, aes(x=diff_in_days, y=GAF_C, color=bblid)) +
  geom_line() +
  labs(title="GAFC")

Look at absolute change from one timepoint to the next:

#calculate change in GAFC scores
gafc_change <- gafc_scan %>% 
  group_by(bblid) %>% 
  arrange(diff_in_days) %>%
  mutate("gafc_diff"= GAF_C - lag(GAF_C))

summary(gafc_change)
##      bblid        DOSIPS               GAF_C       first_7t_date       
##  109577 : 6   Min.   :2013-01-10   Min.   :21.00   Min.   :2013-10-11  
##  82051  : 4   1st Qu.:2014-12-19   1st Qu.:50.00   1st Qu.:2014-07-20  
##  99949  : 4   Median :2015-11-14   Median :60.50   Median :2014-10-27  
##  139272 : 4   Mean   :2016-07-05   Mean   :63.13   Mean   :2015-05-26  
##  85369  : 3   3rd Qu.:2018-03-11   3rd Qu.:78.25   3rd Qu.:2015-08-11  
##  87225  : 3   Max.   :2020-10-26   Max.   :95.00   Max.   :2018-11-30  
##  (Other):36                                                            
##   diff_in_days               dx      dx_source            dx_date          
##  Min.   :-304.00   healthy    : 8   Length:60          Min.   :2013-11-14  
##  1st Qu.: -61.25   other_dx   : 6   Class :character   1st Qu.:2014-07-31  
##  Median : 259.50   psych_spect:46   Mode  :character   Median :2015-01-30  
##  Mean   : 406.13                                       Mean   :2015-05-15  
##  3rd Qu.: 737.75                                       3rd Qu.:2015-06-12  
##  Max.   :1648.00                                       Max.   :2018-09-21  
##                                                                            
##     dx_diff          gafc_diff      
##  Min.   :-259.00   Min.   :-38.000  
##  1st Qu.: -71.00   1st Qu.: -8.750  
##  Median : -50.00   Median :  0.500  
##  Mean   : -11.03   Mean   :  1.526  
##  3rd Qu.:  34.00   3rd Qu.: 12.500  
##  Max.   : 221.00   Max.   : 35.000  
##                    NA's   :22

Negative change -> ppt’s score is getting lower (i.e. symptoms worstening). Look at absolute change immediately pre-post 7T

#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post_gafc <- gafc_scan %>%
  select(1:3, 5) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
  group_by(bblid) %>%
  filter(diff_in_days >= 365) %>%
  arrange(bblid, diff_in_days) %>%
  slice(c(1))%>% 
  rename_with( ~ paste0(.x, "_post")) %>%
  ungroup()

#find datediff that's <365 and closest to 0
pre_gafc <- gafc_scan %>%
  group_by(bblid) %>%
  filter(diff_in_days < 365) %>%
  mutate(abs_diff_in_days=abs(diff_in_days)) %>%
  arrange(bblid, abs_diff_in_days) %>%
  slice(c(1)) %>% 
  select(!c(abs_diff_in_days)) %>% 
  rename_at(c(2,3,5), ~ paste0(.x, "_pre")) %>%
  ungroup()

#new dataframe
proximal_gafc <- inner_join(pre_gafc, post_gafc, by = c("bblid" = "bblid_post")) %>%
  mutate(change_gafc = (GAF_C_post - GAF_C_pre), #absolute change
         p.change_gafc = (change_gafc/GAF_C_pre)*100) #percent change
#calculating days between gafc timepoints
proximal_gafc$gafc_date_diff = as.numeric(difftime(proximal_gafc$DOSIPS_post, proximal_gafc$DOSIPS_pre, units = "days"))

#plot change
ggplot(proximal_gafc, aes(x=change_gafc)) + 
  geom_bar(color="white", fill= "darkgreen") + 
  labs(title="Change in GAF Current Immediately Pre-Post 7T")

proximal_gafc %>% select(6, 3, 11, 13:15) %>% summary()
##            dx       GAF_C_pre       GAF_C_post     change_gafc     
##  healthy    : 4   Min.   :21.00   Min.   :21.00   Min.   :-38.000  
##  other_dx   : 2   1st Qu.:48.50   1st Qu.:50.25   1st Qu.: -0.750  
##  psych_spect:16   Median :55.00   Median :69.00   Median :  4.000  
##                   Mean   :59.36   Mean   :65.14   Mean   :  5.773  
##                   3rd Qu.:76.00   3rd Qu.:80.50   3rd Qu.: 16.000  
##                   Max.   :90.00   Max.   :95.00   Max.   : 35.000  
##  p.change_gafc     gafc_date_diff  
##  Min.   :-43.243   Min.   : 250.0  
##  1st Qu.: -1.471   1st Qu.: 645.8  
##  Median :  7.131   Median : 879.0  
##  Mean   : 13.739   Mean   : 938.5  
##  3rd Qu.: 28.508   3rd Qu.:1240.8  
##  Max.   : 81.395   Max.   :1698.0

Plotting GAF C with dx

#plot absolute change vs baseline GAF C
ggplot(proximal_gafc, aes(x=change_gafc, y=GAF_C_pre, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = bblid), max.overlaps = 30) +
  labs(title="Change in Current GAF Over Time", x="Change (Post - Pre)", y="Baseline GAF C")

#plot absolute change vs time between proximal GAF Cs
ggplot(proximal_gafc, aes(x=gafc_date_diff, y=change_gafc, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = bblid), max.overlaps = 30) +
  labs(title="Change in Current GAF By Delta Time", x="Time Between GAF Assessments (Days)", y="Change GAF (Post - Pre)")

Note: min value between GAF assessments is less than 1 yr (250 days). Will likely need to filter.

Looking at percent change in current GAF

#plot absolute change vs baseline GAF C
ggplot(proximal_gafc, aes(x=p.change_gafc, y=GAF_C_pre, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = bblid), max.overlaps = 30) +
  labs(title="Percent Change in Current GAF Over Time", x="Percent Change (Post - Pre)", y="Baseline GAF C")

#plot absolute change vs time between proximal GAF Cs
ggplot(proximal_gafc, aes(x=gafc_date_diff, y=p.change_gafc, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = bblid), max.overlaps = 30) +
  labs(title="Percent Change in Current GAF By Delta Time", x="Time Between GAF Assessments (Days)", y="Percent Change GAF (Post - Pre)")

Analysis ideas - look at improvers vs worseners? how does that this break down by diagnosis

GASR

Looking at GASR data included in diagnosis_long.csv

#drop missing gasr
gasr <- redcap %>% drop_na(GASR_CURRENT)%>%
  mutate(BBLID = as.factor(BBLID)) %>%
  select(1,2,6)

#join with 7T dates
gasr_date <- inner_join(gasr, dx_baseline, by = c("BBLID" = "bblid"))

#calc diff (negative values -> gasr before 7t)
gasr_date$diff_in_days = as.numeric(difftime(gasr_date$DODIAGNOSIS, gasr_date$first_7t_date, units = "days"))

# drop data more than 1 yr prior to scan
gasr_dx <- subset(gasr_date, diff_in_days >= -365)
min(gasr_dx$diff_in_days) #confirm datediff filtered
## [1] -354
#find subjects that have a baseline sips and sops up to 364 days post-7T 
clin364_gasr <-  gasr_dx %>% subset(diff_in_days <= 364, select = BBLID) %>% unique()
#find subjects that have a followup sips sops >365 days post-7T 
clin365_gasr <- gasr_dx %>% subset(diff_in_days >= 365, select = BBLID) %>% unique()
#filter df for appropriate timepoints
gasr_scan <- subset(gasr_dx, (BBLID %in% clin365_gasr$BBLID & BBLID %in% clin364_gasr$BBLID))

#plot for fun
ggplot(gasr_scan, aes(x=diff_in_days, y=BBLID, color=GASR_CURRENT)) +
  geom_point() +
  geom_vline(xintercept=365) +
  labs(title="GASR Current Relative to First 7T Scan", x="Days from 7T Scan")

length(unique(gasr_scan$BBLID))
## [1] 28
summary(gasr_scan)
##      BBLID     DODIAGNOSIS          GASR_CURRENT             dx    
##  105176 : 5   Min.   :2013-06-09   Min.   :21.00   healthy    :17  
##  83835  : 4   1st Qu.:2015-01-30   1st Qu.:50.00   other_dx   : 7  
##  109577 : 4   Median :2018-01-10   Median :63.00   psych_spect:49  
##  121407 : 4   Mean   :2017-04-18   Mean   :64.14                   
##  139272 : 4   3rd Qu.:2018-07-30   3rd Qu.:78.00                   
##  89367  : 3   Max.   :2021-10-06   Max.   :95.00                   
##  (Other):49                                                        
##   dx_source            dx_date           first_7t_date       
##  Length:73          Min.   :2014-01-08   Min.   :2014-03-07  
##  Class :character   1st Qu.:2014-10-30   1st Qu.:2014-07-22  
##  Mode  :character   Median :2015-01-19   Median :2014-11-07  
##                     Mean   :2015-08-10   Mean   :2015-08-03  
##                     3rd Qu.:2015-06-16   3rd Qu.:2015-08-26  
##                     Max.   :2018-09-21   Max.   :2018-11-30  
##                                                              
##     dx_diff         Diagnostic_Group                     Race    sex   
##  Min.   :-259.000   Other  : 9       Asian                 : 2   M:31  
##  1st Qu.: -71.000   PRO/CHR:42       Black/African American:56   F:42  
##  Median : -50.000   PSY    : 5       Mixed                 : 5         
##  Mean   :   6.863   TD/NC  :17       White                 :10         
##  3rd Qu.: 120.000                                                      
##  Max.   : 269.000                                                      
##                                                                        
##  ethnicity    age_scan      diff_in_days   
##  1: 3      Min.   :15.00   Min.   :-304.0  
##  2:70      1st Qu.:17.00   1st Qu.: -55.0  
##            Median :20.00   Median : 515.0  
##            Mean   :19.58   Mean   : 623.6  
##            3rd Qu.:21.00   3rd Qu.:1182.0  
##            Max.   :26.00   Max.   :2677.0  
## 

There are 28 (28) unique subjects with GASR current scores w/in 1 yr of 7T and after 1 yr follow-up. Looking at scores across time:

#plot longitudinal scores
ggplot(gasr_scan, aes(x=diff_in_days, y=GASR_CURRENT, color=BBLID)) +
  geom_line() +
  labs(title="GASR")

Look at absolute change from one timepoint to the next:

#calculate change in GAFC scores
gasr_change <- gasr_scan %>% 
  group_by(BBLID) %>% 
  arrange(diff_in_days) %>%
  mutate("gasr_diff"= GASR_CURRENT - lag(GASR_CURRENT))

summary(gasr_change)
##      BBLID     DODIAGNOSIS          GASR_CURRENT             dx    
##  105176 : 5   Min.   :2013-06-09   Min.   :21.00   healthy    :17  
##  83835  : 4   1st Qu.:2015-01-30   1st Qu.:50.00   other_dx   : 7  
##  109577 : 4   Median :2018-01-10   Median :63.00   psych_spect:49  
##  121407 : 4   Mean   :2017-04-18   Mean   :64.14                   
##  139272 : 4   3rd Qu.:2018-07-30   3rd Qu.:78.00                   
##  89367  : 3   Max.   :2021-10-06   Max.   :95.00                   
##  (Other):49                                                        
##   dx_source            dx_date           first_7t_date       
##  Length:73          Min.   :2014-01-08   Min.   :2014-03-07  
##  Class :character   1st Qu.:2014-10-30   1st Qu.:2014-07-22  
##  Mode  :character   Median :2015-01-19   Median :2014-11-07  
##                     Mean   :2015-08-10   Mean   :2015-08-03  
##                     3rd Qu.:2015-06-16   3rd Qu.:2015-08-26  
##                     Max.   :2018-09-21   Max.   :2018-11-30  
##                                                              
##     dx_diff         Diagnostic_Group                     Race    sex   
##  Min.   :-259.000   Other  : 9       Asian                 : 2   M:31  
##  1st Qu.: -71.000   PRO/CHR:42       Black/African American:56   F:42  
##  Median : -50.000   PSY    : 5       Mixed                 : 5         
##  Mean   :   6.863   TD/NC  :17       White                 :10         
##  3rd Qu.: 120.000                                                      
##  Max.   : 269.000                                                      
##                                                                        
##  ethnicity    age_scan      diff_in_days      gasr_diff       
##  1: 3      Min.   :15.00   Min.   :-304.0   Min.   :-38.0000  
##  2:70      1st Qu.:17.00   1st Qu.: -55.0   1st Qu.:-10.0000  
##            Median :20.00   Median : 515.0   Median :  0.0000  
##            Mean   :19.58   Mean   : 623.6   Mean   : -0.5333  
##            3rd Qu.:21.00   3rd Qu.:1182.0   3rd Qu.:  7.0000  
##            Max.   :26.00   Max.   :2677.0   Max.   : 35.0000  
##                                             NA's   :28

Negative change -> ppt’s score is getting lower (i.e. symptoms worstening). Look at absolute change immediately pre-post 7T

#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post_gasr <- gasr_scan %>%
  select(1:3, 14) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
  group_by(BBLID) %>%
  filter(diff_in_days >= 365) %>%
  arrange(BBLID, diff_in_days) %>%
  slice(c(1))%>% 
  rename_with( ~ paste0(.x, "_post")) %>%
  ungroup()

#find datediff that's <365 and closest to 0
pre_gasr <- gasr_scan %>%
  group_by(BBLID) %>%
  filter(diff_in_days < 365) %>%
  mutate(abs_diff_in_days=abs(diff_in_days)) %>%
  arrange(BBLID, abs_diff_in_days) %>%
  slice(c(1)) %>% 
  select(!c(abs_diff_in_days)) %>% 
  rename_at(c(2,3,14), ~ paste0(.x, "_pre")) %>%
  ungroup()

#new dataframe
proximal_gasr <- inner_join(pre_gasr, post_gasr, by = c("BBLID" = "BBLID_post")) %>%
  mutate(change_gasr = (GASR_CURRENT_post - GASR_CURRENT_pre), #absolute change
         p.change_gasr = (change_gasr/GASR_CURRENT_pre)*100) #percent change
#calculating days between gafc timepoints
proximal_gasr$gasr_date_diff = as.numeric(difftime(proximal_gasr$DODIAGNOSIS_post, proximal_gasr$DODIAGNOSIS_pre, units = "days"))

#plot change
ggplot(proximal_gasr, aes(x=change_gasr)) + 
  geom_bar(color="white", fill= "darkgreen") + 
  labs(title="Change in GASR Current Immediately Pre-Post 7T")

summary(proximal_gasr)
##      BBLID    DODIAGNOSIS_pre      GASR_CURRENT_pre           dx    
##  18093  : 1   Min.   :2014-01-08   Min.   :21.00    healthy    : 6  
##  82051  : 1   1st Qu.:2014-11-14   1st Qu.:49.50    other_dx   : 3  
##  83835  : 1   Median :2015-02-18   Median :56.00    psych_spect:19  
##  85369  : 1   Mean   :2015-10-13   Mean   :61.46                    
##  87225  : 1   3rd Qu.:2016-05-28   3rd Qu.:80.00                    
##  88760  : 1   Max.   :2018-09-21   Max.   :90.00                    
##  (Other):22                                                         
##   dx_source            dx_date           first_7t_date           dx_diff       
##  Length:28          Min.   :2014-01-08   Min.   :2014-03-07   Min.   :-259.00  
##  Class :character   1st Qu.:2014-11-14   1st Qu.:2014-07-22   1st Qu.: -90.25  
##  Mode  :character   Median :2015-02-18   Median :2015-01-21   Median : -53.00  
##                     Mean   :2015-10-13   Mean   :2015-10-25   Mean   : -12.29  
##                     3rd Qu.:2016-05-28   3rd Qu.:2016-08-19   3rd Qu.:  45.75  
##                     Max.   :2018-09-21   Max.   :2018-11-30   Max.   : 269.00  
##                                                                                
##  Diagnostic_Group                     Race    sex    ethnicity    age_scan    
##  Other  : 3       Asian                 : 1   M:12   1: 1      Min.   :15.00  
##  PRO/CHR:17       Black/African American:21   F:16   2:27      1st Qu.:17.00  
##  PSY    : 2       Mixed                 : 2                    Median :20.00  
##  TD/NC  : 6       White                 : 4                    Mean   :19.68  
##                                                                3rd Qu.:21.25  
##                                                                Max.   :26.00  
##                                                                               
##  diff_in_days_pre  DODIAGNOSIS_post     GASR_CURRENT_post diff_in_days_post
##  Min.   :-259.00   Min.   :2015-11-08   Min.   :21.00     Min.   : 387.0   
##  1st Qu.: -90.25   1st Qu.:2017-12-30   1st Qu.:52.50     1st Qu.: 648.2   
##  Median : -53.00   Median :2018-04-29   Median :65.50     Median : 934.5   
##  Mean   : -12.29   Mean   :2018-06-27   Mean   :64.25     Mean   : 976.6   
##  3rd Qu.:  45.75   3rd Qu.:2019-05-24   3rd Qu.:78.00     3rd Qu.:1280.0   
##  Max.   : 269.00   Max.   :2021-05-13   Max.   :90.00     Max.   :2489.0   
##                                                                            
##   change_gasr      p.change_gasr     gasr_date_diff  
##  Min.   :-38.000   Min.   :-43.243   Min.   : 385.0  
##  1st Qu.: -3.000   1st Qu.: -4.824   1st Qu.: 651.0  
##  Median :  1.500   Median :  2.136   Median :1079.5  
##  Mean   :  2.786   Mean   :  9.388   Mean   : 988.9  
##  3rd Qu.: 13.750   3rd Qu.: 26.434   3rd Qu.:1255.5  
##  Max.   : 35.000   Max.   : 81.395   Max.   :2220.0  
## 

Plotting GASR with dx

#plot absolute change vs baseline GASR
ggplot(proximal_gasr, aes(x=change_gasr, y=GASR_CURRENT_pre, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
  labs(title="Change in GASR Over Time", x="Change (Post - Pre)", y="Baseline GASR")

#plot absolute change vs time between proximal GASR
ggplot(proximal_gasr, aes(x=gasr_date_diff, y=change_gasr, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
  labs(title="Change in GASR By Delta Time", x="Time Between GASR Assessments (Days)", y="Change GASR (Post - Pre)")

Note: min value between GASR assessments is less than 1 yr (250 days). Will likely need to filter.

Looking at percent change in GASR

#plot absolute change vs baseline GASR
ggplot(proximal_gasr, aes(x=p.change_gasr, y=GASR_CURRENT_pre, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
  labs(title="Percent Change in GASR Over Time", x="Percent Change (Post - Pre)", y="Baseline GASR")

#plot absolute change vs time between proximal GASR
ggplot(proximal_gasr, aes(x=gasr_date_diff, y=p.change_gasr, color=dx)) +
  geom_point() +
  geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
  labs(title="Percent Change in GASR By Delta Time", x="Time Between GASR Assessments (Days)", y="Percent Change GASR (Post - Pre)")

One Big Clinical Dataset

Make one big clinical dataset that can be subsectioned to look at different constructs but will also easily show timepoints of subjects (eg on this date, ppt X had diagnosis, SIPS and PRIME). Merge additional clinical info (eg added by David) into our sample to potentially use as a predictor

Merging using dx_scan_date as our base, since it contains the 7t list & all diagnoses with 22q subj removed. * leave SYRP for 2nd to last since it has overlapping assessments with other dfs from Kosha (will need to re-rename prime cols) * merge demo & dx_scan_date last and not by date so info gets applied to ALL timepoints * calc diff_assess from 7t to date_assess

#list all subj so we can make sure all data is 7t subj with no 22q ~ then we can just use full joins
all_subj <- unique(dx_scan_date$bblid)

#clean cgas
summary(cgas)
##    redcapid             bblid        protocol_number         date           
##  Length:93          Min.   : 14528   Length:93          Min.   :2017-12-19  
##  Class :character   1st Qu.: 87646   Class :character   1st Qu.:2018-05-12  
##  Mode  :character   Median : 92155   Mode  :character   Median :2019-01-29  
##                     Mean   : 86345                      Mean   :2019-09-11  
##                     3rd Qu.:105176                      3rd Qu.:2021-02-25  
##                     Max.   :139272                      Max.   :2022-01-20  
##                                                                             
##       cgas         cgas_gaf001     cgas_gaf002     cgas_gaf003   
##  Min.   :0.0000   Min.   :45.00   Min.   :45.00   Min.   :60.00  
##  1st Qu.:0.5000   1st Qu.:71.00   1st Qu.:51.00   1st Qu.:66.00  
##  Median :1.0000   Median :75.00   Median :70.00   Median :69.00  
##  Mean   :0.7333   Mean   :75.78   Mean   :68.67   Mean   :72.75  
##  3rd Qu.:1.0000   3rd Qu.:78.00   3rd Qu.:76.00   3rd Qu.:75.75  
##  Max.   :1.0000   Max.   :99.00   Max.   :99.00   Max.   :93.00  
##  NA's   :78       NA's   :84      NA's   :84      NA's   :89     
##   cgas_review     cgas_review_assessor cgas_review_date   cgas_review_notes 
##  Min.   :0.0000   Length:93            Length:93          Length:93         
##  1st Qu.:0.0000   Class :character     Class :character   Class :character  
##  Median :0.0000   Mode  :character     Mode  :character   Mode  :character  
##  Mean   :0.1111                                                             
##  3rd Qu.:0.0000                                                             
##  Max.   :1.0000                                                             
##  NA's   :84                                                                 
##  cgas_review_assign cgas_review_status cgas_review_complete_date
##  Length:93          Min.   :3          Length:93                
##  Class :character   1st Qu.:3          Class :character         
##  Mode  :character   Median :3          Mode  :character         
##                     Mean   :3                                   
##                     3rd Qu.:3                                   
##                     Max.   :3                                   
##                     NA's   :92                                  
##  child_global_assessment_scale_cgas_complete admin_markup_redcapid_issue_ignore
##  Min.   :0.0000                              Mode:logical                      
##  1st Qu.:0.0000                              NA's:93                           
##  Median :0.0000                                                                
##  Mean   :0.2366                                                                
##  3rd Qu.:0.0000                                                                
##  Max.   :2.0000                                                                
##                                                                                
##  redcapid_check imp_redcapid  
##  Mode:logical   Mode:logical  
##  NA's:93        NA's:93       
##                               
##                               
##                               
##                               
## 
cgas_clean <- cgas %>%
  mutate(bblid=as.factor(bblid),
         date_assess=as.Date(date, format="%Y-%m-%d")) %>% #clean values
  select(2, 5:8,20) %>% # select cols
  subset(bblid %in% all_subj) #make sure correct subj list
#now sips clean & merge
sips_clean <- sips_calc %>% select(1, 7:27, 58:61) %>% #sips_calc already filtered for correct source (proband>collateral) for each timepoint, frequency calculated
  rename(date_assess = DOSIPS) %>%
  subset(bblid %in% all_subj) #make sure correct subj list
#merge
merge_1 <- full_join(sips_clean, cgas_clean, by = c("bblid", "date_assess"))

#prime clean and merge (will add rest of syrp later)
names(prime_p) #version selected for correct source (proband>collateral) for each timepoint, merged prime.csv and syrp primes
##  [1] "bblid"  "date"   "source" "sip003" "sip004" "sip005" "sip006" "sip007"
##  [9] "sip008" "sip009" "sip010" "sip011" "sip012" "sip013" "sip014"
prime_clean <- prime_p %>% rename(date_assess = date, prime_source=source)%>%
  subset(bblid %in% all_subj) #make sure correct subj list
#calculate prime_scores
prime_clean$count.5 <- apply(prime_clean[4:15], 1, function(x) length(which(x==5)))
prime_clean$count.6 <- apply(prime_clean[4:15], 1, function(x) length(which(x==6)))
prime_clean$prime_score <- rowSums(cbind(prime_clean$count.5, prime_clean$count.6))
prime_clean <- prime_clean %>% select(-count.5, -count.6) #remove count.5 and count.6
#merge
merge_2 <- full_join(merge_1, prime_clean, by = c("bblid", "date_assess"))

#clean dx and merge
all_dx_clean <- all_dx %>% rename(date_assess=dx_date) %>%
  mutate(bblid = as.factor(bblid)) %>%
  subset(bblid %in% all_subj) #make sure correct subj list
#merge
merge_3 <-full_join(merge_2, all_dx_clean, by = c("bblid", "date_assess"))

#clean syrp and merge ~ several proband assessments on same day, how to filter by admin_event?
syrp_clean <- syrp %>% rename(date_assess=date) %>%
  select(!c(13:37)) %>% #drop PRIME ~ maybe clean this up more later as well, lots of columns!
  subset(bblid %in% all_subj) #make sure correct subj list
#merge
merge_4 <- full_join(merge_3, syrp_clean, by = c("bblid", "date_assess"))
length(unique(merge_4$bblid)) 
## [1] 130
#merge demographics and first scan date since they'll get applied to all clinical timepoints
first7t <- first7t %>% mutate(BBLID=as.factor(BBLID))
merge_demo.scan <- full_join(demo, first7t, by = c("BBLID")) %>%
  subset(BBLID %in% all_subj) #make sure correct subj list

#merge demo & first scan into clinical data
all_data <- full_join(merge_demo.scan, merge_4, by = c("BBLID" = "bblid"))
length(unique(all_data$BBLID)) #should always have 132 subj
## [1] 132
#also making a version withough sryp because it has multiple proband assessments per day/complicates
all_data_nosyrp <- full_join(merge_demo.scan, merge_3, by = c("BBLID" = "bblid"))
length(unique(all_data_nosyrp$BBLID)) #should always have 132 subj
## [1] 132
#calculate datediff and filter for >1 yr pre-7t
all_data_nosyrp$date_diff = as.numeric(difftime(all_data_nosyrp$date_assess, all_data_nosyrp$first_7t_date, units = "days"))
all_data_nosyrp <- all_data_nosyrp %>% subset(date_diff >= -365)
length(unique(all_data_nosyrp$BBLID))
## [1] 122

There are 122 (122) subjects who have any kind of potentially usable clinical data (e.g. not from >1 yr prior to scan). `

Appendix: Missing Diagnoses

There are 24 (24) subj who have 7T scans but are missing diagnoses from redcap/PNC. Writing to csv alongside info on the scan protocol and clinical data David added to Box (syrp spreadsheets) to try to figure out where these individuals are from/where more clinical data could be found.

#list missing dx ppts
dx_missing_info <- dx_baseline %>% subset(is.na(dx)) %>%
  select(!c(3,4,6,8:10))

#merge in syrp data to see if david has other clinical info
extra_redcap <- syrp[1:11]
dx_missing_info <- left_join(dx_missing_info, extra_redcap, by="bblid") %>%
  mutate(protocol_num = as.factor(protocol_numbe),
         redcap_id = as.factor(redcap_id),
         protocol = as.factor(protocol),
         libi_id = as.factor(libi_id),
         location = as.factor(location),
         admin_proband = as.factor(admin_proband),
         admin_event = as.factor(admin_event))
summary(dx_missing_info)
##      bblid              dx     first_7t_date        Diagnostic_Group
##  19783  : 5   healthy    : 0   Min.   :2014-05-08   Other  : 9      
##  19971  : 5   other_dx   : 0   1st Qu.:2018-07-09   PRO/CHR: 0      
##  19974  : 5   psych_spect: 0   Median :2019-04-12   PSY    : 3      
##  19790  : 4   NA's       :75   Mean   :2018-11-28   TD/NC  :41      
##  19798  : 4                    3rd Qu.:2019-05-06   NA's   :22      
##  20523  : 4                    Max.   :2019-06-19                   
##  (Other):48                                                         
##     age_scan       redcap_id  admin_time_stamp          protocol 
##  Min.   :18.00   24     : 1   Length:75          7Tdep_pilot:31  
##  1st Qu.:19.00   25     : 1   Class :character   NARSAD     :21  
##  Median :21.00   37     : 1   Mode  :character   SYRP       : 6  
##  Mean   :21.74   38     : 1                      URF        :15  
##  3rd Qu.:23.00   39     : 1                      NA's       : 2  
##  Max.   :30.00   (Other):68                                      
##  NA's   :17      NA's   : 2                                      
##  protocol_numbe   libi_id        date              assessor        
##  Min.   :818621   NA's:75   Min.   :2018-02-28   Length:75         
##  1st Qu.:825834             1st Qu.:2018-06-25   Class :character  
##  Median :825940             Median :2019-04-15   Mode  :character  
##  Mean   :825981             Mean   :2018-12-30                     
##  3rd Qu.:825940             3rd Qu.:2019-05-03                     
##  Max.   :828621             Max.   :2019-06-19                     
##  NA's   :2                  NA's   :2                              
##         location  admin_proband   admin_event protocol_num
##  3535 Market: 1   p   :73       postscan:23   818621: 5   
##  Gates 10   :46   NA's: 2       prescan :28   825834:31   
##  HUP6       : 4                 selfr_1 :22   825940:21   
##  SC7T       :22                 NA's    : 2   828621:16   
##  NA's       : 2                               NA's  : 2   
##                                                           
## 
write.csv(dx_missing_info, "missing_dx.csv", row.names=FALSE, na="")